{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_dEaVsqSgNyQ"
      },
      "source": [
        "##### Copyright 2021 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4FyfuZX-gTKS"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sT8AyHRMNh41"
      },
      "source": [
        "# Recommend movies for users with TensorFlow Ranking\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/ranking/tutorials/quickstart\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/ranking/blob/master/docs/tutorials/quickstart.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/ranking/blob/master/docs/tutorials/quickstart.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/ranking/docs/tutorials/quickstart.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8f-reQ11gbLB"
      },
      "source": [
        "In this tutorial, we build a simple two tower ranking model using the [MovieLens 100K dataset](https://grouplens.org/datasets/movielens/100k/) with TF-Ranking. We can use this model to rank and recommend movies for a given user according to their predicted user ratings."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qA00wBE2Ntdm"
      },
      "source": [
        "## Setup\n",
        "\n",
        "Install and import the TF-Ranking library:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6yzAaM85Z12D"
      },
      "outputs": [],
      "source": [
        "!pip install -q tensorflow-ranking\n",
        "!pip install -q --upgrade tensorflow-datasets"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "n3oYt3R6Nr9l"
      },
      "outputs": [],
      "source": [
        "from typing import Dict, Tuple\n",
        "\n",
        "import tensorflow as tf\n",
        "\n",
        "import tensorflow_datasets as tfds\n",
        "import tensorflow_ranking as tfr"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zCxQ1CZcO2wh"
      },
      "source": [
        "## Read the data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A0sY6-Rtt_Co"
      },
      "source": [
        "Prepare to train a model by creating a ratings dataset and movies dataset. Use `user_id` as the query input feature, `movie_title` as the document input feature, and `user_rating` as the label to train the ranking model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "M-mxBYjdO5m7"
      },
      "outputs": [],
      "source": [
        "%%capture --no-display\n",
        "# Ratings data.\n",
        "ratings = tfds.load('movielens/100k-ratings', split=\"train\")\n",
        "# Features of all the available movies.\n",
        "movies = tfds.load('movielens/100k-movies', split=\"train\")\n",
        "\n",
        "# Select the basic features.\n",
        "ratings = ratings.map(lambda x: {\n",
        "    \"movie_title\": x[\"movie_title\"],\n",
        "    \"user_id\": x[\"user_id\"],\n",
        "    \"user_rating\": x[\"user_rating\"]\n",
        "})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5W0HSfmSNCWm"
      },
      "source": [
        "Build vocabularies to convert all user ids and all movie titles into integer indices for embedding layers:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9I1VTEjHzpfX"
      },
      "outputs": [],
      "source": [
        "movies = movies.map(lambda x: x[\"movie_title\"])\n",
        "users = ratings.map(lambda x: x[\"user_id\"])\n",
        "\n",
        "user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(\n",
        "    mask_token=None)\n",
        "user_ids_vocabulary.adapt(users.batch(1000))\n",
        "\n",
        "movie_titles_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(\n",
        "    mask_token=None)\n",
        "movie_titles_vocabulary.adapt(movies.batch(1000))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zMsmoqWTOTKo"
      },
      "source": [
        "Group by `user_id` to form lists for ranking models:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lXY7kX7nOSwH"
      },
      "outputs": [],
      "source": [
        "key_func = lambda x: user_ids_vocabulary(x[\"user_id\"])\n",
        "reduce_func = lambda key, dataset: dataset.batch(100)\n",
        "ds_train = ratings.group_by_window(\n",
        "    key_func=key_func, reduce_func=reduce_func, window_size=100)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "57r87tdQlkcT"
      },
      "outputs": [],
      "source": [
        "for x in ds_train.take(1):\n",
        "  for key, value in x.items():\n",
        "    print(f\"Shape of {key}: {value.shape}\")\n",
        "    print(f\"Example values of {key}: {value[:5].numpy()}\")\n",
        "    print()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YcZJf2qxOeWU"
      },
      "source": [
        "Generate batched features and labels:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ctq2RTOqOfAo"
      },
      "outputs": [],
      "source": [
        "def _features_and_labels(\n",
        "    x: Dict[str, tf.Tensor]) -\u003e Tuple[Dict[str, tf.Tensor], tf.Tensor]:\n",
        "  labels = x.pop(\"user_rating\")\n",
        "  return x, labels\n",
        "\n",
        "\n",
        "ds_train = ds_train.map(_features_and_labels)\n",
        "\n",
        "ds_train = ds_train.apply(\n",
        "    tf.data.experimental.dense_to_ragged_batch(batch_size=32))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RJUU3mv-_VdQ"
      },
      "source": [
        "The `user_id` and `movie_title` tensors generated in `ds_train` are of shape `[32, None]`, where the second dimension is 100 in most cases except for the batches when less than 100 items grouped in lists. A model working on ragged tensors is thus used."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GTquqk1GkIfd"
      },
      "outputs": [],
      "source": [
        "for x, label in ds_train.take(1):\n",
        "  for key, value in x.items():\n",
        "    print(f\"Shape of {key}: {value.shape}\")\n",
        "    print(f\"Example values of {key}: {value[:3, :3].numpy()}\")\n",
        "    print()\n",
        "  print(f\"Shape of label: {label.shape}\")\n",
        "  print(f\"Example values of label: {label[:3, :3].numpy()}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Lrch6rVBOB9Q"
      },
      "source": [
        "## Define a model\n",
        "\n",
        "Define a ranking model by inheriting from `tf.keras.Model` and implementing the `call` method:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "e5dNbDZwOIHR"
      },
      "outputs": [],
      "source": [
        "class MovieLensRankingModel(tf.keras.Model):\n",
        "\n",
        "  def __init__(self, user_vocab, movie_vocab):\n",
        "    super().__init__()\n",
        "\n",
        "    # Set up user and movie vocabulary and embedding.\n",
        "    self.user_vocab = user_vocab\n",
        "    self.movie_vocab = movie_vocab\n",
        "    self.user_embed = tf.keras.layers.Embedding(user_vocab.vocabulary_size(),\n",
        "                                                64)\n",
        "    self.movie_embed = tf.keras.layers.Embedding(movie_vocab.vocabulary_size(),\n",
        "                                                 64)\n",
        "\n",
        "  def call(self, features: Dict[str, tf.Tensor]) -\u003e tf.Tensor:\n",
        "    # Define how the ranking scores are computed: \n",
        "    # Take the dot-product of the user embeddings with the movie embeddings.\n",
        "\n",
        "    user_embeddings = self.user_embed(self.user_vocab(features[\"user_id\"]))\n",
        "    movie_embeddings = self.movie_embed(\n",
        "        self.movie_vocab(features[\"movie_title\"]))\n",
        "\n",
        "    return tf.reduce_sum(user_embeddings * movie_embeddings, axis=2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BMV0HpzmJGWk"
      },
      "source": [
        "Create the model, and then compile it with ranking `tfr.keras.losses` and `tfr.keras.metrics`, which are the core of the TF-Ranking package. \n",
        "\n",
        "This example uses a ranking-specific **softmax loss**, which is a listwise loss introduced to promote all relevant items in the ranking list with better chances on top of the irrelevant ones. In contrast to the softmax loss in the multi-class classification problem, where only one class is positive and the rest are negative, the TF-Ranking library supports multiple relevant documents in a query list and non-binary relevance labels.\n",
        "\n",
        "For ranking metrics, this example uses in specific **Normalized Discounted Cumulative Gain (NDCG)** and **Mean Reciprocal Rank (MRR)**, which calculate the user utility of a ranked query list with position discounts. For more details about ranking metrics, review evaluation measures [offline metrics](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Offline_metrics)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "H2tQDhqkOKf1"
      },
      "outputs": [],
      "source": [
        "# Create the ranking model, trained with a ranking loss and evaluated with\n",
        "# ranking metrics.\n",
        "model = MovieLensRankingModel(user_ids_vocabulary, movie_titles_vocabulary)\n",
        "optimizer = tf.keras.optimizers.Adagrad(0.5)\n",
        "loss = tfr.keras.losses.get(\n",
        "    loss=tfr.keras.losses.RankingLossKey.SOFTMAX_LOSS, ragged=True)\n",
        "eval_metrics = [\n",
        "    tfr.keras.metrics.get(key=\"ndcg\", name=\"metric/ndcg\", ragged=True),\n",
        "    tfr.keras.metrics.get(key=\"mrr\", name=\"metric/mrr\", ragged=True)\n",
        "]\n",
        "model.compile(optimizer=optimizer, loss=loss, metrics=eval_metrics)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NeBnBFMfVLzP"
      },
      "source": [
        "## Train and evaluate the model\n",
        "\n",
        "Train the model with `model.fit`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bzGm7WqSVNyP"
      },
      "outputs": [],
      "source": [
        "model.fit(ds_train, epochs=3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "V5uuSRXZoOKW"
      },
      "source": [
        "Generate predictions and evaluate."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6Hryvj3cPnvK"
      },
      "outputs": [],
      "source": [
        "# Get movie title candidate list.\n",
        "for movie_titles in movies.batch(2000):\n",
        "  break\n",
        "\n",
        "# Generate the input for user 42.\n",
        "inputs = {\n",
        "    \"user_id\":\n",
        "        tf.expand_dims(tf.repeat(\"42\", repeats=movie_titles.shape[0]), axis=0),\n",
        "    \"movie_title\":\n",
        "        tf.expand_dims(movie_titles, axis=0)\n",
        "}\n",
        "\n",
        "# Get movie recommendations for user 42.\n",
        "scores = model(inputs)\n",
        "titles = tfr.utils.sort_by_scores(scores,\n",
        "                                  [tf.expand_dims(movie_titles, axis=0)])[0]\n",
        "print(f\"Top 5 recommendations for user 42: {titles[0, :5]}\")"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "quickstart.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
